A Simple Evaluation Model for Feature Subset Selection Algorithms

Authors

  • Huei Diana Lee
  • Maria Carolina Monard
  • Richardson Floriani Voltolini
  • Ronaldo C. Prati
  • Feng Chung Wu
Abstract

The aim of Feature Subset Selection (FSS) algorithms is to select, according to some importance criterion, a subset of the original features that describe a data set. To accomplish this task, FSS removes irrelevant and/or redundant features, as they may decrease data quality and degrade several of the desired properties of classifiers induced by supervised learning algorithms. As finding the best subset of features is an NP-hard problem, FSS algorithms generally use heuristics to select subsets. Therefore, it is important to empirically evaluate the performance of these algorithms. However, this evaluation needs to be multicriteria, i.e., it should take into account several properties. This work describes a simple model we have proposed to evaluate FSS algorithms which considers two properties, namely the predictive performance of the classifier induced using the subset of features selected by different FSS algorithms, as well as the reduction in the number of features. Another multicriteria performance evaluation model based on rankings, which makes it possible to consider any number of properties, is also presented. The models are illustrated by their application to four well-known FSS algorithms and two versions of a new FSS algorithm we have developed.
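
The abstract does not spell out how the ranking-based model combines properties, so the following is only a minimal sketch of one common approach: rank the algorithms on each property and average the ranks. The function names, the algorithm labels (`fss_a`, `fss_b`, `fss_c`), the property names, and all numbers are hypothetical placeholders, not the authors' method or results.

```python
from statistics import mean


def rank_values(values, higher_is_better):
    """Return 1-based ranks (1 = best); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend the window over all entries tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def average_ranks(scores, higher_is_better):
    """scores: {algorithm: {property: value}} -> {algorithm: mean rank}."""
    names = list(scores)
    per_property = [
        rank_values([scores[n][prop] for n in names], better)
        for prop, better in higher_is_better.items()
    ]
    return {n: mean(r[i] for r in per_property) for i, n in enumerate(names)}


# Placeholder numbers for illustration only; they are NOT from the paper.
example = {
    "fss_a": {"error_rate": 0.10, "feature_reduction": 0.50},
    "fss_b": {"error_rate": 0.15, "feature_reduction": 0.80},
    "fss_c": {"error_rate": 0.20, "feature_reduction": 0.30},
}
# Lower error is better; a larger reduction in features is better.
directions = {"error_rate": False, "feature_reduction": True}
result = average_ranks(example, directions)
print(result)
```

Because every property is reduced to a rank before averaging, further properties can be added as extra keys without rescaling, which is the appeal of a rank-based scheme when the properties are measured in different units.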

Similar resources

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Determining Optimal Support Vector Machines for Hyperspectral Image Classification Based on a Genetic Algorithm

Hyperspectral remote sensing imagery, owing to its rich spectral information, provides an efficient tool for ground classification in complex geographical areas with similar classes. Given the robustness of Support Vector Machines (SVMs) in high-dimensional spaces, they are an efficient tool for classifying hyperspectral imagery. However, there are two optimization issues which s...

Improvement of effort estimation accuracy in software projects using a feature selection approach

In recent years, the use of feature selection techniques has become an essential requirement for processing and model construction in different scientific areas. In the field of software project effort estimation, the need to apply dimensionality reduction and feature selection methods has become unavoidable. The high volumes of data, costs, and time necessary for gathering data, ...

A Parallel Genetic Algorithm Based Method for Feature Subset Selection in Intrusion Detection Systems

Intrusion detection systems are designed to provide security in computer networks, so that if an attacker gets past other security devices, they can detect and prevent the attack. One of the most essential challenges in designing these systems is the so-called curse of dimensionality. Therefore, in order to obtain satisfactory performance in these systems we have to take advantage of app...

IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF

The increasing use of the Internet and phenomena such as sensor networks has led to an unnecessary increase in the volume of information. Though it has many benefits, it creates problems such as the need for more storage space and better processors, as well as for data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...

Journal title:
  • Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial

Volume 10

Publication date: 2006